Facebook Ads Data Analysis


This notebook explores the methodologies and limitations of using the Facebook Graph API to access the Facebook Ads Library. It then analyses advertising data from the 2021 Dutch General Elections to extract key insights into advertising practices.


The notebook is divided into three main sections:

The first section discusses how to access and use Facebook ads data from a general user's perspective via the Facebook Ads Library service, illustrated with a generic search. The second section shows how to connect directly to the Graph API in developer mode and gather advertisement data for analysis. Finally, the third section presents a pilot data analysis exploring key insights that could be useful for consumer protection investigations or for compliance with competition law.

Key Insights Summary

In this pilot data analysis, advertisement data was collected on more than 8k ads using Facebook's Ad Library API. The API was searched across all advertisers, and only the period around the 2021 Dutch General Elections was retained, i.e. January to April 2021; all campaigns collected were inactive. We observed that ads were shown to consumers for 4 days on average; in particular, the 65+ group was the most microtargeted, with ad impressions lasting almost a month. Interestingly, certain ads microtargeted very specific groups and reached a huge audience (more than 1M impressions) in a short amount of time (less than a day). These came from Jonge Socialisten in de PvdA, Wopke Hoekstra (CDA) and Woonbond, targeting 18-24 female, 18-24 male and 65+ male respectively.

CDA was by far the party with the most unique ads, with more than 2k, followed by smaller parties like Volt with 409 and DENK with 329. Although CDA spent up to 150,000 euros on campaigns, it was not the biggest spender: Forum voor Democratie (FVD) spent up to 220,000 euros on ad campaigns. The least microtargeted regions were Friesland and Zeeland, meaning that no campaigns were intended exclusively for those regions. Marketers chose the last Friday before the election days (March 12) to launch the majority of the campaigns (3-4 days prior to voting); these ads were generally targeted at all provinces except Zeeland, Groningen and Friesland.

In terms of topics, no special patterns were found; only the mentions of lockdown and crisis plans were new. Additionally, using a text similarity metric (Levenshtein distance) between the links and the page names, we did not find links pointing to dubious websites. Note that some advertisers were mistakenly classified as political by Facebook's algorithms because their ad texts may contain words associated with political issues like "crisis", "environment" or "freedom".

Data acquisition and analysis were done in Python, using Plotly for visualisation and the Natural Language Toolkit for word frequencies. The ads data and code used for the analysis can be found in the GitHub repo of this work. For any comments, please contact us at p.hernandezserrano@maastrichtuniversity.nl


Date: Jun 2021
Author: Pedro V Hernández Serrano
License: Attribution 4.0 International (CC BY 4.0)


1. Accessing the Facebook API as a General User or Marketer


The Facebook Ads Library is an open repository of all the ads and campaigns, active and inactive, that Facebook runs in many countries. This repository is exposed as a service with a search engine interface that helps users easily look up ads, campaigns, Facebook pages, etc. in the Facebook ads database. The service is naturally connected to the Facebook Graph API. The search interface has two filters: Country and Ad Type.

Limitations:

Interface example:

drawing

Once the search is defined, one can browse all the ads related to the keywords entered. Additional filters will appear: Active/Inactive, Advertiser, Platform and Impressions by date.


One can see the details of each ad, but it is important to note that the details only show the ad identifier, the link to the page, and the ad content. Information about the demographics of the audiences to which the ad was shown is NOT presented by Facebook.


It is impossible to run the last query through the API, since Facebook only makes the ad type POLITICAL_AND_ISSUE_ADS public; the rest of the ads are therefore not accessible via the API.


The following technical report on ad transparency in the EU documents the use of the Facebook Graph API and how it was used to analyse general election ads: https://adtransparency.mozilla.org/eu/methods/

The Facebook Ads Library service has a number of uses. Typically, marketers use it to get inspiration from, or gauge the impact of, different campaigns worldwide. A general user can also look back at ads seen in the past to get details about the products that were offered. The database is huge and, in principle, given a suspicious Facebook Page id or a particular keyword combination, one could aim to collect evidence of unfair or illegal advertising practices.

2. Accessing the Facebook API as a Developer


The Facebook Graph API is the primary way for apps to read from and write to the Facebook social graph. The official documentation is found in the APIs and SDKs docs. The Graph API has many uses, from creating and publishing a game to analysing friend networks. Naturally, it also contains the ads that Facebook publishes, but only a limited subset: those about social issues and politics. To access and use the API, you need to gain access to the Facebook Ads Library API at https://www.facebook.com/ID and confirm your identity for running (or analysing) Ads About Social Issues, Elections or Politics. This involves receiving a letter with a code at your official account and sending picture identification to Facebook. In short, one has to be registered as an official Facebook developer, and obtaining this permission can take from one day to several weeks.

The Facebook API also has a convenient user interface for developers called the Graph API Explorer, which allows the developer or analyst to quickly generate access tokens, get code samples of the queries to run, or generate debug information to include in support requests. More information is available in its documentation.

Requirements

  1. Register as a developer at developers.facebook.com
  2. Go to the Graph API Explorer and create an app
  3. With the new app ID, create a token for the app in the UI
  4. Define the Graph API Node to use: ads_archive

An example query that can be retrieved from the Graph API Explorer is the following

        ads_archive?access_token=[TOKEN]
        &ad_type=POLITICAL_AND_ISSUE_ADS
        &ad_active_status=ALL
        &fields=ad_creation_time%2Cad_creative_body%2Cpage_name%2Cdemographic_distribution
        &limit=100
        &ad_reached_countries=NL
        &search_terms=.

There are of course a number of clients that can perform API calls; in the following pilot data analysis we use a Python implementation.
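As a rough illustration, the example query above could be assembled from Python as follows. This is a minimal sketch using only the standard library, not the code used for the analysis: the API version in the base URL and the helper names `build_ads_archive_url` and `fetch_page` are assumptions made for this example, while the query parameters are the ones listed in the query above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the API version in this base URL is illustrative only.
BASE_URL = "https://graph.facebook.com/v10.0/ads_archive"


def build_ads_archive_url(token, country="NL", search_terms=".", limit=100):
    """Build the ads_archive query URL with the parameters of the example query."""
    params = {
        "access_token": token,
        "ad_type": "POLITICAL_AND_ISSUE_ADS",
        "ad_active_status": "ALL",
        "fields": ",".join([
            "ad_creation_time",
            "ad_creative_body",
            "page_name",
            "demographic_distribution",
        ]),
        "limit": limit,
        "ad_reached_countries": country,
        "search_terms": search_terms,
    }
    # urlencode escapes the commas in "fields" as %2C, as in the raw query.
    return BASE_URL + "?" + urlencode(params)


def fetch_page(url):
    """Fetch one page of results and return the decoded JSON payload."""
    with urlopen(url) as response:
        return json.load(response)


# Hypothetical placeholder token; a real one comes from the Graph API Explorer.
url = build_ads_archive_url("TOKEN")
print(url)
```

Calling `fetch_page(url)` requires a valid token, so only the URL construction is exercised here.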

Interface example:

3. Discovering Key Insights Analysing Ads Data

Facebook Ads - Data Collection


Max Woolf's facebook-ad-library-scraper is the best out-of-the-box solution (as of early 2021) to retrieve ads data from a Python client, since it requires minimal dependencies.

!pip3 install requests tqdm plotly

!python fb_ad_lib_scraper.py

fb_ads.csv: The raw ads and their metadata.
fb_ads_demos.csv: The unnested demographic distributions of people reached by ads, which can be mapped to fb_ads.csv via the ad_id field.
fb_ads_regions.csv: The unnested region distributions of people reached by ads, which can be mapped to fb_ads.csv via the ad_id field.
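The mapping via ad_id can be sketched as a simple join. The example below is only an illustration on made-up in-memory data: apart from `ad_id`, the column names (`page_name`, `age`, `gender`, `percentage`) are assumptions and may differ from the real CSV headers.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up stand-ins for fb_ads.csv and fb_ads_demos.csv.
ads_csv = io.StringIO("ad_id,page_name\n1,Party A\n2,Party B\n")
demos_csv = io.StringIO(
    "ad_id,age,gender,percentage\n"
    "1,18-24,female,0.6\n"
    "1,65+,male,0.4\n"
    "2,65+,male,1.0\n"
)

# Index the ads by ad_id so each demographic row can look up its ad.
ads = {row["ad_id"]: row for row in csv.DictReader(ads_csv)}

# Group the unnested demographic rows by ad_id.
demos_by_ad = defaultdict(list)
for row in csv.DictReader(demos_csv):
    demos_by_ad[row["ad_id"]].append(row)

# One joined record per (ad, demographic group) pair.
joined = [
    {**ads[ad_id], **demo}
    for ad_id, demos in demos_by_ad.items()
    for demo in demos
    if ad_id in ads
]

for record in joined:
    print(record)
```

The same pattern applies to fb_ads_regions.csv, with a region column in place of age and gender.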

This notebook extracts over 8,000 inactive ads by querying the term "stem" (Dutch for "vote"), thereby filtering for ads related to the elections, and by setting a manageable limit for the API.

Extracting and reading the data

Facebook Ads - Descriptive Statistics


This section focuses on the statistical methodologies for describing the key insights of the Facebook Ads data. It is important to note that no hypothesis testing is performed in this pilot data analysis, meaning that no correlation analysis or causal effects are studied. The purpose of this section is to understand the key insights of the data and, from there, to ask questions about advertising practices.

Number of unique ads

The number of unique ads considered for this pilot data analysis is:

Ads Impressions Period

Ads run according to the configuration set by the campaign creator. Normally, the campaign runs until the funds are exhausted, and there is a declared minimum and maximum budget per ad. Following this logic, some ads run for hours while others run for days. Here we compute the average, maximum, minimum and count of the ad durations.

The dataset extracted from the API covers the period 2019-09 to 2021-07. This whole period was retrieved because the period cannot be specified in the API call, so the date filter has to be applied at the data level. For this analysis, we take 2021-01-01 to 2021-05-01.

The campaigns are online for 4 days and 7 hours on average, with a standard deviation of 5 days; the maximum campaign duration is 43 days.
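The date filter and the duration statistics can be sketched as below. This is an illustrative example on made-up records, not the notebook's actual computation: the keys `start` and `stop` stand in for the delivery start and stop timestamps of each ad, and the use of the sample standard deviation is an assumption.

```python
from datetime import datetime
from statistics import mean, stdev

# Hypothetical records; "start" and "stop" stand in for the ad delivery times.
ads = [
    {"ad_id": "a", "start": "2021-03-01T08:00:00", "stop": "2021-03-03T08:00:00"},
    {"ad_id": "b", "start": "2021-03-10T00:00:00", "stop": "2021-03-16T00:00:00"},
    {"ad_id": "c", "start": "2020-11-01T00:00:00", "stop": "2020-11-02T00:00:00"},
]

# Analysis window applied at the data level, as described above.
window_start = datetime(2021, 1, 1)
window_end = datetime(2021, 5, 1)

durations = []  # campaign durations in days
for ad in ads:
    start = datetime.fromisoformat(ad["start"])
    stop = datetime.fromisoformat(ad["stop"])
    if window_start <= start < window_end:
        durations.append((stop - start).total_seconds() / 86400)

summary = {
    "count": len(durations),
    "mean_days": mean(durations),
    "std_days": stdev(durations),  # assumption: sample standard deviation
    "min_days": min(durations),
    "max_days": max(durations),
}
print(summary)
```

Ad "c" falls outside the window and is excluded, so the summary is computed over the two remaining ads.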

The longest campaign was the following ad from the DENK party, mentioning "coronacrisis". This ad was online for 43 days.

drawing

Here is the actual URL to the archive:
https://www.facebook.com/ads/library/?active_status=all&ad_type=political_and_issue_ads&country=NL&q=161990739027528&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all

As we can see, ads normally link to the party page. However, some mischievous advertisers and marketers use ads to promote links or websites that lead to scam pages or, sometimes, to malware. One way to check the authenticity of a link is to compare the link URL with the name of the originating page. Following this approach, if the page name is D66, then the associated link should be d66.nl or similar; the same holds for non-partisan websites like KiesKlimaat and its link kiesklimaat.nl. There will obviously be a number of false positives, but the comparison is nevertheless a good indicator of dubious links.

The Levenshtein distance is used to measure the similarity between the link and the page name: the closer the distance is to 0, the more similar the names.
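For reference, the Levenshtein distance can be computed with the classic dynamic-programming recurrence. The sketch below is not necessarily how the notebook computes it; in particular, the helper `link_distance` and its normalisation (lower-casing, dropping a leading "www." and comparing only the first label of the host name) are assumptions made for this illustration.

```python
from urllib.parse import urlparse


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def link_distance(page_name, link):
    """Distance between a normalised page name and the first label of the link host."""
    host = urlparse(link).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    label = host.split(".")[0]
    name = "".join(ch for ch in page_name.lower() if ch.isalnum())
    return levenshtein(name, label)


print(link_distance("D66", "https://d66.nl/standpunten"))
print(link_distance("KiesKlimaat", "https://www.kiesklimaat.nl"))
print(link_distance("D66", "https://example.com"))
```

The first two calls compare a page with its own domain, while the last one compares a page name with an unrelated host and yields a larger distance.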

There are no immediate signs of dubious links in the ads before the Dutch general elections. However, one way to verify this is to export the list and manually check the authenticity of each link.

The following example shows a link that is similar to the page name and therefore has a distance close to 0.

Amount of Ads Rate Change

What are the peaks and rises in the ads shown in the last weeks before the election? It is interesting to note that some advertisers and pages suddenly invested their whole budget very early, leaving fewer impressions towards the end.

Some advertisers are obviously non-political, so we do not include them. Facebook mislabels them because their ads include keywords related to social issues. It also appears that Wopke Hoekstra went "all in" towards the end.

Number of Ads by Advertiser

Which page or party places the most ads on Facebook?
To answer this, we simply count the unique ads of each page. It is interesting to note that certain campaigns of the same party run in parallel.

CDA was by far the party with the most unique ads, with more than 2k (counting CDA together with the personal Wopke Hoekstra page), followed by smaller parties like Volt with 409 and DENK with 329.

Total Spent by Advertiser

Who is actually the big spender in this game? The ad details contain the minimum and maximum budget for each ad. We calculate the median of those two points and accumulate the amount in euros per page/party.
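The accumulation can be sketched as follows. With only two points per ad, the median is simply the midpoint of the declared range. The records are made up, and the key names `spend_lower` and `spend_upper` are assumptions standing in for the minimum and maximum budget of each ad.

```python
from collections import defaultdict

# Made-up ads with a declared spend range in euros.
ads = [
    {"page_name": "Party A", "spend_lower": 0, "spend_upper": 99},
    {"page_name": "Party A", "spend_lower": 100, "spend_upper": 199},
    {"page_name": "Party B", "spend_lower": 1000, "spend_upper": 1499},
]

# Accumulate the midpoint of each ad's spend range per page.
totals = defaultdict(float)
for ad in ads:
    midpoint = (ad["spend_lower"] + ad["spend_upper"]) / 2
    totals[ad["page_name"]] += midpoint

# Rank pages from the biggest to the smallest estimated spender.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for page, amount in ranking:
    print(f"{page}: {amount:.2f} EUR")
```

Because the API only reports ranges, any such total is an estimate; summing the upper bounds instead would give an "up to" figure.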

Even though FvD created only 94 unique ads, they were the big spenders, potentially meaning that they put less effort into ad creation and simply more budget into each ad. This may suggest that they have a more consistent message (at least in their ads).

Demographic Groups Distribution

The Facebook Ads Library does not include any information about the actual people who were exposed to the ads; doing so would be a real problem for Facebook nowadays. Instead, the API provides aggregated statistics on the demographic groups to which each ad was displayed. For example, ad number 785007302430176 was displayed 60% to men and 40% to women, as well as 80% to the 65+ group and 20% to the 18-24 group. We can therefore ask different questions, such as: which groups are exposed for longer?

Distributions by Group

Moreover, we can explore which groups tend to be more microtargeted, meaning that a campaign creator focuses only on a particular combination of demographic attributes, for instance the "Female 18-24" group.
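One possible way to flag such ads is to look for ads whose impressions went almost entirely to a single gender and age combination. The sketch below illustrates that idea on made-up rows; the 0.99 threshold and the function name `microtargeted` are assumptions for this example, not the criterion used in the notebook.

```python
# Made-up demographic rows: (ad_id, gender, age group, share of impressions).
demos = [
    ("ad1", "female", "18-24", 1.0),
    ("ad2", "male", "65+", 0.6),
    ("ad2", "female", "65+", 0.4),
]


def microtargeted(rows, threshold=0.99):
    """Return ads whose largest demographic group holds at least `threshold` of impressions."""
    top_group = {}
    for ad_id, gender, age, share in rows:
        # Keep only the group with the largest share for each ad.
        if ad_id not in top_group or share > top_group[ad_id][2]:
            top_group[ad_id] = (gender, age, share)
    return {
        ad_id: group
        for ad_id, group in top_group.items()
        if group[2] >= threshold
    }


flagged = microtargeted(demos)
print(flagged)
```

Here only "ad1" is flagged, since "ad2" is split between two groups.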

The only outstanding pattern is a higher tendency to microtarget females: at the peak of each age group, the female group received 100% of the ads.

Regardless of the age group, campaigns normally run proportionally. The youngest group, 13-17, still received some ads even though they cannot vote, and the 65+ group is the most microtargeted.

Microtargeted Ads

Given the above observations, we can further explore the cleverer marketing tricks: microtargeting that reaches audiences in the most efficient way, for instance by targeting one specific demographic group. To find these, we select only the ads that were displayed to a single region, gender and age group and that quickly exhausted their budget.

The top 3 ads that specifically targeted demographic groups, exhausted their budget in a short period of time and reached the maximum audience:

drawing

Here the actual URLs to the archive:

Top ad by Region

Similarly to the demographic groups, one can aggregate the ads impressions by Dutch province.

The reading is that 8 out of 100 ads shown in Noord-Brabant are not shown anywhere else. On the other hand, almost all the ads shown in Friesland were not intended specifically for that region.
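The "shown nowhere else" share can be computed per province as sketched below. This is an illustration on made-up rows under the assumption that each row pairs an ad with one province in which it received impressions; it is not the notebook's own code.

```python
from collections import defaultdict

# Made-up rows: (ad_id, province in which the ad received impressions).
rows = [
    ("ad1", "Noord-Brabant"),
    ("ad2", "Noord-Brabant"),
    ("ad2", "Friesland"),
    ("ad3", "Friesland"),
    ("ad3", "Zeeland"),
]

# Collect the set of provinces reached by each ad.
regions_by_ad = defaultdict(set)
for ad_id, region in rows:
    regions_by_ad[ad_id].add(region)

shown = defaultdict(int)      # ads shown in the province
exclusive = defaultdict(int)  # ads shown in that province only
for ad_id, regions in regions_by_ad.items():
    for region in regions:
        shown[region] += 1
        if len(regions) == 1:
            exclusive[region] += 1

# Share of each province's ads that were shown nowhere else.
exclusive_share = {region: exclusive[region] / shown[region] for region in shown}
print(exclusive_share)
```

A share of 0.08 for a province would correspond to the "8 out of 100" reading above.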

The campaigns clearly came to an end during the election days, although a small handful of them continued after the election for some reason. There is a clear peak on the Friday before the election (March 12), when the majority of the ads were displayed, right before the election days.

Ads Topics


An n-gram analysis is performed on the text of the ads. Terms from 1-grams to 4-grams are considered, using Dutch and English dictionaries; finally, term frequency and inverse term frequency are compared.
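The idea can be sketched in plain Python as below. This is a simplified illustration rather than the NLTK-based workflow of the notebook: the example texts are made up, no dictionary or stop-word filtering is applied, and the inverse frequency is computed as the usual inverse document frequency, which is an assumption about what is being compared.

```python
import math
import re
from collections import Counter


def ngrams(text, n_min=1, n_max=4):
    """Yield all n-grams of the text for n between n_min and n_max."""
    tokens = re.findall(r"\w+", text.lower())
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


# Made-up ad texts.
docs = [
    "Stem voor het klimaat",
    "Stem voor een crisis plan",
    "Het klimaat kan niet wachten",
]

term_freq = Counter()  # total occurrences of each n-gram
doc_freq = Counter()   # number of ads containing each n-gram
for doc in docs:
    grams = list(ngrams(doc))
    term_freq.update(grams)
    doc_freq.update(set(grams))

# Inverse document frequency: rarer n-grams get a higher weight.
idf = {term: math.log(len(docs) / count) for term, count in doc_freq.items()}

for term, count in term_freq.most_common(5):
    print(term, count, round(idf[term], 3))
```

Frequent n-grams with a low inverse frequency tend to be generic terms, while rarer ones point to more specific topics.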

The n-gram analysis does not necessarily yield key insights; mostly we see generic terms. What would be interesting to analyse is the consistency between the advertisements and the actual campaign plans. Moreover, this workflow can be adapted to run iteratively, so that we could see which topics are shown to which demographic groups and regions.


There is also some interesting work conducting similar analyses. For example, Roberto Rocha from CBC News reports how 35,000 political ads on Facebook were analysed in the Canadian elections; his main focus was to discuss how rules could affect advertising practices. Also, Ondrej Pekacek has created an excellent monitor of ads in the Czech elections. He aims to create an automated workflow that would inform analysts covering the political communication and financing of Czech elections (see his example dashboard).

There are dozens of other projects focused on the US, which are very interesting but mostly concentrate on voter fraud conspiracies, a topic more closely related to scams in ads. Ultimately, we have discussed the limitations and piloted a data analysis of how the Facebook Ads Library can help bring transparency to advertising practices and to democracy.

In this notebook, we have focused on the Dutch General Elections of 2021 and found no surprising insights. What is more important, however, is to open the discussion on whether Facebook should be required to open its Ads Library to give access to all types of ads and not only the "Social Issues, Elections or Politics" category.

